perf: remove ExecutionCtx from OperationsVTable::scalar_at #7417
The scalar_at hot path was creating and dropping an ExecutionCtx on every call via LEGACY_SESSION.create_execution_ctx(). Each creation involves an atomic fetch_add + Arc clone, and each drop an Arc release. This was called millions of times during query execution (primarily from PrimitiveTyped::value_unchecked during search_sorted and patches).

Only Patched::scalar_at actually uses ctx (for .execute()), so move the ctx creation there. All other implementations had unused `_ctx` params.

Profiled with apmc (Apple Silicon hardware performance counters).

ClickBench (Vortex format, Apple Silicon M4 Max, 3 iterations):

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Cycles | 1,973.5B | 1,942.6B | -1.6% |
| Instructions | 4,730.2B | 4,722.5B | -0.16% |
| IPC | 2.40 | 2.43 | +1.25% |
| Wall clock | 43.09s | 41.92s | -2.7% |
| Dispatch stalls | 593.7B | 573.9B | -3.3% |
| L1D cache misses | 80.7B | 80.6B | ~0% |
| Branch mispredicts | 11.2B | 11.2B | ~0% |

Signed-off-by: Alexander Droste <droste.alexander@gmail.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
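The change described above can be illustrated with a minimal, self-contained sketch. It does not use the Vortex API: `Session`, `Ctx`, `Encoding`, `create_ctx`, and both `scalar_at_*` functions are hypothetical stand-ins, and the only assumption carried over from the description is that creating a context costs one atomic `fetch_add` plus one `Arc` clone.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for a session that hands out execution contexts.
struct Session {
    next_ctx_id: AtomicU64,
}

impl Session {
    fn new() -> Arc<Session> {
        Arc::new(Session { next_ctx_id: AtomicU64::new(0) })
    }
}

// Hypothetical context: creating one costs an atomic fetch_add and an Arc
// clone, and dropping it releases the Arc again.
#[allow(dead_code)]
struct Ctx {
    id: u64,
    session: Arc<Session>,
}

fn create_ctx(session: &Arc<Session>) -> Ctx {
    Ctx {
        id: session.next_ctx_id.fetch_add(1, Ordering::Relaxed),
        session: Arc::clone(session),
    }
}

// Number of contexts created so far; used only to observe the effect.
fn contexts_created(session: &Arc<Session>) -> u64 {
    session.next_ctx_id.load(Ordering::Relaxed)
}

// Two toy encodings: only the patched one needs a context.
enum Encoding {
    Primitive(Vec<i64>),
    Patched { base: Vec<i64>, patch_index: usize, patch_value: i64 },
}

fn lookup_patched(_ctx: &Ctx, base: &[i64], patch_index: usize, patch_value: i64, index: usize) -> i64 {
    if index == patch_index { patch_value } else { base[index] }
}

// Before: every call creates and drops a context, even when it is unused.
fn scalar_at_before(session: &Arc<Session>, array: &Encoding, index: usize) -> i64 {
    let ctx = create_ctx(session);
    match array {
        Encoding::Primitive(values) => values[index],
        Encoding::Patched { base, patch_index, patch_value } => {
            lookup_patched(&ctx, base, *patch_index, *patch_value, index)
        }
    }
}

// After: the context is created only in the branch that uses it.
fn scalar_at_after(session: &Arc<Session>, array: &Encoding, index: usize) -> i64 {
    match array {
        Encoding::Primitive(values) => values[index],
        Encoding::Patched { base, patch_index, patch_value } => {
            let ctx = create_ctx(session);
            lookup_patched(&ctx, base, *patch_index, *patch_value, index)
        }
    }
}
```

In this sketch, calling `scalar_at_after` on the `Primitive` variant never touches the shared atomic counter or the `Arc` reference count, which is the kind of per-call overhead the description attributes to the hot path.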
Merging this PR will degrade performance by 22.79%
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.990x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.994x ➖, 0↑ 0↓)
datafusion / parquet (1.041x ➖, 0↑ 3↓)
datafusion / arrow (1.081x ➖, 0↑ 8↓)
duckdb / vortex-file-compressed (0.988x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.994x ➖, 0↑ 0↓)
duckdb / parquet (0.942x ➖, 6↑ 1↓)
duckdb / duckdb (0.996x ➖, 0↑ 0↓)
|
File Sizes: TPC-H SF=1 on NVME
No file size changes detected.
|
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.007x ➖
datafusion / vortex-file-compressed (1.007x ➖, 0↑ 0↓)
|
File Sizes: PolarSignals Profiling
No file size changes detected.
|
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.059x ➖, 0↑ 3↓)
datafusion / vortex-compact (1.020x ➖, 0↑ 0↓)
datafusion / parquet (1.023x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.993x ➖, 1↑ 0↓)
duckdb / vortex-compact (1.002x ➖, 0↑ 1↓)
duckdb / parquet (1.003x ➖, 0↑ 0↓)
|
File Sizes: FineWeb NVMe
No file size changes detected.
|
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.999x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.001x ➖, 0↑ 2↓)
datafusion / parquet (1.000x ➖, 1↑ 1↓)
duckdb / vortex-file-compressed (0.992x ➖, 4↑ 1↓)
duckdb / vortex-compact (1.000x ➖, 2↑ 2↓)
duckdb / parquet (0.997x ➖, 0↑ 1↓)
duckdb / duckdb (0.999x ➖, 1↑ 2↓)
|
File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.
|
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.005x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.988x ➖, 0↑ 0↓)
datafusion / parquet (1.007x ➖, 0↑ 0↓)
datafusion / arrow (1.002x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.996x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.999x ➖, 0↑ 0↓)
duckdb / parquet (0.998x ➖, 0↑ 0↓)
duckdb / duckdb (1.001x ➖, 0↑ 0↓)
|
File Sizes: TPC-H SF=10 on NVME
No file size changes detected.
|
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (0.928x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.967x ➖, 0↑ 0↓)
duckdb / parquet (0.959x ➖, 0↑ 0↓)
|
File Sizes: Statistical and Population Genetics
No file size changes detected.
|
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.007x ➖, 0↑ 1↓)
datafusion / parquet (0.999x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (1.025x ➖, 1↑ 5↓)
duckdb / parquet (1.008x ➖, 0↑ 0↓)
duckdb / duckdb (1.021x ➖, 0↑ 4↓)
|
File Sizes: Clickbench on NVME
File Size Changes (101 files changed, -33.3% overall, 0↑ 101↓)
Totals:
|
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.071x ➖, 0↑ 4↓)
datafusion / vortex-compact (1.037x ➖, 0↑ 0↓)
datafusion / parquet (0.958x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.981x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.985x ➖, 1↑ 0↓)
duckdb / parquet (1.022x ➖, 0↑ 0↓)
|
Benchmarks: FineWeb S3
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.975x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.935x ➖, 0↑ 0↓)
datafusion / parquet (1.024x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.993x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.028x ➖, 0↑ 0↓)
duckdb / parquet (0.992x ➖, 0↑ 0↓)
|
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.025x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.972x ➖, 0↑ 0↓)
datafusion / parquet (0.959x ➖, 2↑ 1↓)
duckdb / vortex-file-compressed (1.072x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.997x ➖, 0↑ 0↓)
duckdb / parquet (0.979x ➖, 0↑ 0↓)
|
patch_indices is always Primitive after execution (enforced by require_child! in Patched::execute), and PrimitiveArray::slice returns a PrimitiveArray directly. Replace the optimize + execute chain with a zero-cost try_downcast, removing the last LEGACY_SESSION.create_execution_ctx() from the scalar_at hot path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
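As a rough illustration of the downcast idea in this commit message, the sketch below uses the standard library's `Any` to recover a concrete type from a trait object without running any execution step. It is not the Vortex `try_downcast`: the `Array` trait, `PrimitiveArray`, `OtherArray`, `try_downcast_primitive`, and `patch_index_at` are all hypothetical names chosen for the example.

```rust
use std::any::Any;

// Hypothetical array trait exposing itself as `Any` for downcasting.
trait Array {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical concrete encodings.
struct PrimitiveArray {
    values: Vec<u64>,
}

struct OtherArray;

impl Array for PrimitiveArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for OtherArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Recover the concrete type with a type-id comparison; no allocation,
// no context creation, no canonicalization step.
fn try_downcast_primitive(array: &dyn Array) -> Option<&PrimitiveArray> {
    array.as_any().downcast_ref::<PrimitiveArray>()
}

// Read one patch index, relying on the invariant that the indices child
// is already primitive. Returns None if the invariant does not hold or
// the position is out of bounds.
fn patch_index_at(indices: &dyn Array, position: usize) -> Option<u64> {
    try_downcast_primitive(indices)?.values.get(position).copied()
}
```

The sketch returns `None` when the invariant is violated; whether the real code should panic, return an error, or fall back to execution in that case is a design decision this example does not settle.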
We did in |
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
|
As discussed offline with @joseph-isaacs, we want to keep the API as is for now.
|
Sorry to close this. In the future we will very likely want to use the execution context throughout execution. I think for now we should accept this perf loss. We can re-examine this in the future.
Summary
ExecutionCtx in scalar_at
Profiling
Profiled with apmc. ClickBench, Vortex format, Apple Silicon M4 Max, 3 iterations: